The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability
Abstract
This paper describes the Buckeye corpus of spontaneous American English speech, a 307,000-word corpus containing the speech of 40 talkers from central Ohio, USA. The method used to elicit and record the speech is described, followed by a description of the protocol that was developed to phonemically label what talkers said. The results of a test of labeling consistency are then presented. The corpus will be made available to the scientific community when labeling is completed. © 2004 Elsevier B.V. All rights reserved.
Related articles
An analysis of transcription consistency in spontaneous speech from the Buckeye corpus
We present a preliminary analysis of transcriber consistency in labeling and segmentation of words and phones in the Buckeye corpus of spontaneous, informal speech. We find that pairwise inter-transcriber agreement on exact phone label match was 76%, and segmentation agreement within 20% of phone pair length was 75%, though longer phones are more consistently segmented than shorter phones. Patt...
Naïve listeners’ prominence and boundary perception
This paper examines how ordinary listeners, naïve with respect to the phonetics and phonology of prosody, perceive the location of prosodic boundaries that demarcate speech “chunks” and prominences that serve a “highlighting” function, in spontaneous speech (Buckeye corpus). Over 70 naïve listeners marked the locations of prominences and boundaries in a real-time transcription task. Fleiss’ mul...
The Buckeye corpus of speech: updates and enhancements
This paper describes recent progress in the development of the Buckeye Corpus of Speech, a phonetically labeled corpus of conversational American English speech, first described in [1]. With the publication of the second phase of transcription, the corpus has nearly doubled in size from the first release. We briefly give an overview of the corpus, report on additional studies of inter-labeler a...
Aligning phonetic transcriptions with their citation forms
One of the main motivations for publishing this paper is to make available a matrix of phone-distance measures which may be useful in dealing with large corpora of conversational speech. The paper reports how this matrix of phone-distances was created from transcriber labeling disagreements, and how it can be used in a dynamic time warping algorithm to align phonetic transcriptions of conversat...
Prosody in a corpus of French spontaneous speech: perception, annotation and prosody ~ syntax interaction
Our study focuses on the issue of prosodic annotation and of the prosody ~ syntax interface in conversation and is based on a large corpus of conversational speech in French. The results of inter-transcriber agreement tests show that two expert transcribers are consistent in their labeling of prosodic phrasing and that this consistency is well above chance. A qualitative analysis reveals transcri...
Journal:
- Speech Communication
Volume 45
Publication date: 2005